High Scalability of HDFS using Distributed Namespace
Authors
Abstract
Hadoop is widely used by organizations for data-intensive computing. Client applications of Hadoop require the system to be highly available and scalable. Most of these applications are online, and their data growth rate is unpredictable. The current Hadoop relies on a secondary namenode for failover, which slows down the system. The scalability of a Hadoop system depends on the vertical scalability of the namenode server: as the namespace of the Hadoop Distributed File System (HDFS) grows, it demands additional memory to cache. When a namenode server does not have enough primary memory to cache the namespace, its performance and availability suffer. A new Hadoop architecture is proposed to address namenode scalability, the single point of failure, and availability. The approach distributes the namespace using distributed hash tables: the growing HDFS namespace is spread across multiple namenode servers, which are arranged in a Chord ring. This allows HDFS to scale horizontally. The proposed architecture is simulated using multiple namenode servers. The system provides a decentralized management approach to namespace distribution, which gives consistent performance. Results for an HDFS namespace storing 1 billion or more files are discussed in this work. The proposed architecture shows high availability and adapts to namenode failure.
General Terms: Data intensive computing, Scalability, Failover, Availability
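The abstract's idea of placing namenodes on a Chord ring and assigning each part of the namespace to one of them can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the class `NamenodeRing`, the helper `ring_id`, the choice of SHA-1 truncated to 32 bits, and the decision to hash full file paths are all hypothetical. The sketch also uses a global sorted view of the ring instead of Chord's finger-table routing, so it shows only the successor-based ownership rule.

```python
import bisect
import hashlib


def ring_id(key: str, bits: int = 32) -> int:
    """Hash a string onto the identifier circle [0, 2**bits). (Hypothetical helper.)"""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class NamenodeRing:
    """Hypothetical ring of namenodes: a path is owned by its successor node."""

    def __init__(self, nodes):
        self._ids = []    # sorted ring identifiers of live namenodes
        self._by_id = {}  # ring identifier -> namenode name
        for node in nodes:
            self.join(node)

    def join(self, node: str) -> None:
        """Add a namenode to the ring."""
        node_id = ring_id(node)
        if node_id in self._by_id:
            raise ValueError(f"identifier collision for {node!r}")
        bisect.insort(self._ids, node_id)
        self._by_id[node_id] = node

    def leave(self, node: str) -> None:
        """Remove a namenode, for example after a failure."""
        node_id = ring_id(node)
        self._ids.remove(node_id)
        del self._by_id[node_id]

    def lookup(self, path: str) -> str:
        """Return the namenode responsible for the given file path."""
        if not self._ids:
            raise LookupError("no namenodes in the ring")
        # First node whose identifier is >= the key; wrap around at the end.
        idx = bisect.bisect_left(self._ids, ring_id(path))
        return self._by_id[self._ids[idx % len(self._ids)]]


if __name__ == "__main__":
    ring = NamenodeRing(["nn1", "nn2", "nn3", "nn4"])
    path = "/user/alice/data/part-00000"
    print("owner before failure:", ring.lookup(path))
    ring.leave(ring.lookup(path))  # simulate failure of the owning namenode
    print("owner after failure:", ring.lookup(path))
```

With this ownership rule, removing a namenode only reassigns the paths that node owned; every other path keeps its owner, which is one plausible reading of how such a design could adapt to namenode failure. A real system would also need to replicate the metadata so that the successor can actually serve the reassigned paths.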
Similar resources
A Model-Based Namespace Metadata Benchmark for HDFS
Efficient namespace metadata management is increasingly important as next-generation storage systems are designed for peta- and exascale. New schemes have been proposed; however, their evaluation has been insufficient due to the lack of an appropriate namespace metadata benchmark. We describe MimesisBench, a novel namespace metadata benchmark for next-generation storage systems, and demonstrate i...
DNN: A Distributed NameNode Filesystem for Hadoop
The Hadoop Distributed File System (HDFS) is the distributed storage infrastructure for the Hadoop big-data analytics ecosystem. A single node, called the NameNode of HDFS, stores the metadata of the entire file system and coordinates the file content placement and retrieval actions of the data storage subsystems, called DataNodes. However, the single NameNode architecture has long been viewed a...
Cross-Partition Protocols in a Distributed File Service
Keywords: distributed file system, distributed namespace, fault tolerance, Storage Area Network (SAN). A number of ongoing research projects follow a partition-based approach in order to achieve high scalability for access to the distributed storage service. These systems maintain a namespace that references objects distributed across multiple locations in the system. Typically, atomic commitment protocol...
HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases
Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS’ sing...
A Novel Approach for Improving Security and Storage Efficiency on HDFS
Distributed file systems for the storage of massive files have obvious advantages compared with conventional file systems. For instance, the Hadoop Distributed File System (HDFS), implemented with commodity hardware, has the advantages of low cost, high fault tolerance, scalability, etc. However, HDFS has a potential safety hazard due to the unencrypted data stored in the Datanode, which may cause da...